Abstract

To obtain more information on the Hevea brasiliensis genome, we sequenced the transcriptome from 
the vegetative shoot apex yielding 2 311 497 reads. Clustering and assembly of the reads produced a 
total of 113 313 unique sequences, comprising 28 387 isotigs and 84 926 singletons. Also, 17 819 
expressed sequence tag (EST)-simple sequence repeats (SSRs) were identified from the data set. To demon- 
strate the use of this EST resource for marker development, primers were designed for 430 of the EST- 
SSRs. Three hundred and twenty-three primer pairs were amplifiable in H. brasiliensis clones. 
Polymorphic information content values of selected 47 SSRs among 20 H. brasiliensis clones ranged 
from 0.13 to 0.71, with an average of 0.51. A dendrogram of genetic similarities between the 20 H. 
brasiliensis clones using these 47 EST-SSRs suggested two distinct groups that correlated well with 
clone pedigree. These novel EST-SSRs together with the published SSRs were used for the construction 
of an integrated parental linkage map of H. brasiliensis based on 81 lines of an F1 mapping population. 
The map consisted of 97 loci, comprising 37 novel EST-SSRs and 60 published SSRs, distributed on
23 linkage groups and covered 842.9 cM with a mean interval of 11.9 cM and ~4 loci per linkage
group. Although the number of linkage groups exceeds the haploid number (18), the presence of several
common markers between homologous linkage groups in this map and the previous map indicates that the F1
map in this study is appropriate for further study in marker-assisted selection.

Key words: transcriptome sequencing; marker development; rubber tree (Hevea brasiliensis); linkage map 
construction 



1. Introduction 

Hevea brasiliensis, commonly known as the rubber tree, is
almost the sole source of natural rubber production.
Natural rubber has a wide range of industrial applications
and is under increasing global demand. Hevea
brasiliensis is a perennial, cross-pollinating and
monoecious plant that belongs to the Euphorbiaceae
family. The observation of tetravalents during meiosis
has led to the conclusion that H. brasiliensis is a stabilized
amphidiploid (2n = 4x = 36). 1 However, the
pattern of marker ratios segregating in a population
of over 100 trees suggests that H. brasiliensis behaves
as a diploid (2n = 36). 2 Several research groups have
developed molecular markers to study the genetic
diversity of H. brasiliensis, 3-8 including isozymes, 9
restriction fragment length polymorphisms (RFLPs), 10
amplified fragment length polymorphisms (AFLPs), 2
microsatellites (or simple sequence repeats,
SSRs) 2,8,11,12 and expressed sequence tag (EST)-SSRs. 3
These markers have also been used to construct
linkage maps 2 and quantitative trait loci (QTL)
maps. 13-16 Lespinasse et al. 2 produced the first
rubber tree linkage map containing 301 RFLPs, 388
AFLPs, 18 SSRs and 10 isozymes, which was later used
to identify the QTLs conferring resistance to
South American leaf blight. 13-15 Recently, Le
Guen et al. 16 constructed linkage maps based on
SSR and AFLP markers and were able to identify the
QTL conferring resistance to Microcyclus ulei.

In many organisms, ESTs have been useful for the 
annotation of genes during genome sequencing 
efforts, 17 for comparative genome studies 18 and for 
the production of a genetic linkage map. 19 To date, 
there are only 12 365 ESTs from H. brasiliensis in
GenBank, restricting the quality of research that can 
be performed on this important plant species. 
Previous transcriptome studies of H. brasiliensis have 
been limited in range, focusing mainly on latex in 
order to gain insight into the rubber biosynthesis 
pathways. Of the available H. brasiliensis ESTs,
11 256 ESTs are from latex, 1091 ESTs are from
bark and 18 ESTs are from leaves. In addition to
gene discovery, EST resources enable the identification
of markers such as EST-SSRs and single-nucleotide
polymorphisms. Since these markers are directly
linked to functional genes, they are useful for assessing
genetic diversity and mapping phenotypic traits.
Feng et al. 3 identified 799 SSRs in 10 829 ESTs available
in the GenBank database and carried out the
genetic diversity assessment of H. brasiliensis using 
87 EST-SSR markers. The result provided evidence 
for cross-taxa transferability and indicated moderate 
polymorphisms of EST-SSR markers in Hevea 
species. 3 However, additional markers are desirable 
to enable quality research into the genetic basis of 
commercially relevant traits that can be used in 
marker-assisted breeding programs. 

Genomic and transcriptomic resources for H.
brasiliensis can greatly benefit from the application
of recent high-throughput sequencing technologies,
such as the 454 pyrosequencer, 23 which has
been instrumental in the development of genetic
databases for several economically important crops. 24-27 The
purpose of the present study is therefore to sequence
the transcriptome of the shoot apical tissue, which is a
highly dynamic structure, to discover genes, expand
the EST database and develop EST-SSR markers that
can be used for assessing genetic diversity,
constructing linkage maps and identifying traits of
commercial interest.

2. Materials and methods 

2.1. Plant materials 

The shoot apical meristem (SAM; 1-cm long from the 
vegetative shoot apex) of H. brasiliensis (clone 
RRIM600) was collected for RNA extraction from an 
experimental field at the Rubber Research Institute of 
Thailand, Ministry of Agriculture and Cooperatives, 
Thailand. The sample was immediately frozen in 
liquid nitrogen and stored at -80°C until RNA extrac- 
tion. For the analysis of SSR markers, leaf samples 
from 20 clones of H. brasiliensis, 2 accessions of 
Manihot esculenta, 3 accessions of Jatropha curcas and 
1 accession of Jatropha gossypifolia (Supplementary 
data, File S1) were collected and DNA was extracted 
using a DNeasy Plant Mini Kit (Qiagen). For genetic
linkage map construction, 81 samples of an F1
mapping population were developed from a cross
between RRIM600 as the female parent and RRII105 as
the male parent. The plants were grown at the
Chachoengsao Rubber Research Center, Office of
Agriculture Research and Development, Department
of Agriculture, Ministry of Agriculture and
Cooperatives, Thailand. DNA samples were extracted
using a DNeasy Plant Mini Kit (Qiagen). The concentration
of each sample was calculated from the OD
measurement using a NanoDrop ND1000 (NanoDrop
Technologies).

2.2. cDNA library preparation and sequencing 

Total RNA was extracted using Concert™ Plant RNA
Reagent (Invitrogen). Two hundred nanograms of the
poly-A mRNA sample was isolated using an Absolutely
mRNA Purification Kit (Stratagene) and fragmented in
10x fragmentation buffer (0.1 M ZnCl2, 0.1 M Tris-HCl,
pH 7.0) at 70°C for 30 s. The reaction was
stopped by adding 2 µl of 0.5 M EDTA and 28 µl of
10 mM Tris-HCl, pH 7.5. The mRNA sample was
cleaned using Agencourt RNAClean reagent
(Beckman Coulter), washed with 200 µl of 70%
EtOH, air dried and eluted in 20 µl of 10 mM Tris-HCl,
pH 7.5. Fragmented mRNA samples were converted
to double-stranded cDNA with the cDNA
Synthesis System Kit (Roche Applied Sciences) using
random primers and AMV Reverse Transcriptase. A
cDNA library for 454 pyrosequencing was prepared
according to the October 2008 version of the cDNA
Rapid Library Preparation protocol (Roche Applied
Sciences). The cDNA library was amplified by emulsion
PCR and subjected to pyrosequencing on two full picotiter
plates of the Genome Sequencer (GS) FLX
Titanium platform using the October 2008 version of
the Titanium chemistry protocol (Roche Applied
Sciences).

2.3. Sequence analysis 

Poly-A/T(18) and 454-adapter sequences were
trimmed off. Sequence reads with low quality
(average quality scores <20), short reads
(<100 bp), rRNAs and tRNAs were removed.
Sequence assembly was performed using the cDNA
option of Newbler 2.5, the de novo sequence assembly
software. The cDNA option assembles reads into
contigs much like a genomic assembly; however, this
option allows/expects contigs to have multiple
joining contigs representing alternatively spliced
genes. These contigs (which essentially represent
exons) are then assembled into isotigs (representing
processed mRNA), and isotigs utilizing overlapping
subsets of contigs are grouped into isogroups (representing
genes, isoforms or gene families). When the
maximum number of contigs in an isogroup is
exceeded, the contigs are output as un-traversed
contigs in the isotigs file; all further mention of
isotigs will include these un-traversed contigs.
Unique sequences were searched for sequence homology
against the UniProt plant protein database
(www.uniprot.org) and reference protein sequences
of Manihot, Ricinus, Arabidopsis and Oryza
(www.phytozome.net) using the BLASTx program with an
E-value cutoff of 1E-6. 28 The assignment of functionality via
gene ontology (GO) was performed using Blast2GO. 29
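
The length and quality filters described at the start of this section can be illustrated with a minimal Python sketch. This is not the pipeline used in the study: the function names and the in-memory (name, sequence, qualities) record layout are assumptions made for illustration, and adapter/poly-A trimming and rRNA/tRNA screening are assumed to have been done beforehand.

```python
from statistics import mean

MIN_MEAN_QUALITY = 20  # reads with a lower average quality score are dropped
MIN_LENGTH = 100       # reads shorter than this many bases are dropped


def passes_filters(sequence, qualities):
    """Return True if a trimmed read meets the length and quality thresholds."""
    if len(sequence) < MIN_LENGTH:
        return False
    return mean(qualities) >= MIN_MEAN_QUALITY


def filter_reads(reads):
    """Keep the (name, sequence, qualities) records that pass both filters."""
    return [read for read in reads if passes_filters(read[1], read[2])]


# Hypothetical reads: one acceptable, one too short, one of low quality.
example_reads = [
    ("good", "A" * 120, [30] * 120),
    ("short", "A" * 80, [35] * 80),
    ("lowq", "A" * 150, [15] * 150),
]
kept = [name for name, _, _ in filter_reads(example_reads)]
```

Checking the length first keeps the quality average from being computed on reads that would be discarded anyway.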

The MISA (MIcroSAtellite) identification tool
(http://pgrc.ipk-gatersleben.de/misa/misa.html) was
used to search for SSRs in the EST data set. For the
searches and comparison of microsatellites, SSRs
were defined as mononucleotide repeats
(MNRs) of ≥10 repeats and di- (DNRs), tri- (TNRs),
tetra- (TTNRs), penta- and hexanucleotide repeats
of ≥6 repeats; the criterion for composite SSRs was an interval
of <100 bases. For the purpose of marker evaluation,
we increased stringency to reduce the number of candidates.
We designed primer pairs spanning DNRs
and TNRs ≥8, TTNRs ≥7, pentanucleotide repeats
(and more) ≥6 or containing complex SSRs ≥30
nucleotides. In some cases, candidate SSRs that
passed the criteria suggested by Feng et al. 3 were
prioritized based on the presence of motif size polymorphisms
in the sequence alignment results.
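
The repeat-count criteria above can be sketched as a small regular-expression search in Python. This is an illustration only and not the MISA implementation: the function names are hypothetical, the thresholds are treated as minimum counts (an assumption consistent with Table 2, where the repeat classes start at 10 and 6), and composite SSRs are not handled.

```python
import re

# Minimum repeat counts per motif length (mono: 10; di- to hexanucleotide: 6).
MIN_REPEATS = {1: 10, 2: 6, 3: 6, 4: 6, 5: 6, 6: 6}


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit."""
    n = len(motif)
    return not any(n % k == 0 and motif[:k] * (n // k) == motif for k in range(1, n))


def find_ssrs(sequence):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    sequence = sequence.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # One captured unit followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            if is_primitive(motif):  # skip e.g. "AA" reported as a dinucleotide
                hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)


# Hypothetical test sequence with one (AG)8 repeat and one (A)12 repeat.
example = "TTC" + "AG" * 8 + "CCT" + "A" * 12 + "GT"
ssrs = find_ssrs(example)
```

The primitive-motif check prevents a homopolymer or dinucleotide run from being counted again under a longer motif length.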

2.4. SSR markers from the previous reports 

Primer pairs designed for the amplification of
genomic SSR markers from the NCBI database
(AY486558.1-AY486910.1) and those previously
described 8 were used to construct a linkage map
together with the novel EST-SSR markers.



2.5. SSR analysis 

DNA samples were extracted from young leaf tissue 
using a DNeasy Plant Mini Kit (Qiagen). Primer pairs 
were designed to amplify SSR regions using 
PRIMER3. 30 PCR was carried out in a total volume of
10 µl containing 2 ng of DNA template, 1x Taq
buffer, 2 mM MgCl2, 0.2 mM dNTPs, 1 U Taq DNA
polymerase (Fermentas) and 0.5 µM each of
forward and reverse primers. Amplification was performed
in a GeneAmp PCR 9700 thermocycler
(Applied Biosystems) programmed as follows: 94°C
for 2 min followed by 35 cycles of 94°C for 30 s,
52°C for 30 s, 72°C for 1 min and a final extension
step at 72°C for 10 min. Amplified products were separated
on 5% denaturing polyacrylamide gels and
visualized by silver staining.

2.6. Analysis of polymorphic loci 

Twenty accessions of H. brasiliensis, as listed in
Supplementary data, File S1, were used for the polymorphism
analysis of SSR markers. Details of primer
pairs for amplifiable EST-SSR markers are listed in
Supplementary data, File S2. Scored data from polymorphic
loci were used to calculate the polymorphism
information content (PIC). 31 Observed heterozygosity
and expected heterozygosity were calculated using
the PowerMarker 3.25 software. 32 The cross-taxa
transferability of H. brasiliensis SSR loci was evaluated
using six accessions from three other Euphorbiaceae taxa:
two accessions of M. esculenta, three accessions of J.
curcas and one accession of J. gossypifolia (Supplementary File S1).
The percentage of transferability was calculated for
each taxon by dividing the number of successfully
amplified SSR loci by the total number of loci
analysed. A genetic similarity matrix was prepared for
the 20 H. brasiliensis genotypes at 47 EST-SSR loci
(Supplementary File S3) using the NTSYSpc 2.2 software. 33
UPGMA (unweighted pair group method
with arithmetic mean) cluster analysis was conducted
using the NTSYSpc 2.2 software. 33
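
The per-locus statistics named in this section can be sketched in Python as follows. The formulas are assumptions, since the text only cites references for them: expected heterozygosity is taken as 1 − Σp_i², and PIC as 1 − Σp_i² − Σ_{i<j} 2p_i²p_j², the form commonly attributed to Botstein et al. Function names and the allele counts are hypothetical.

```python
def allele_frequencies(allele_counts):
    """Convert a mapping of allele -> count into a list of frequencies."""
    total = sum(allele_counts.values())
    return [count / total for count in allele_counts.values()]


def expected_heterozygosity(freqs):
    """Expected heterozygosity (gene diversity): 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """PIC: 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    homozygosity = sum(p * p for p in freqs)
    cross_terms = sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return 1.0 - homozygosity - cross_terms


def transferability_percent(amplified_loci, total_loci):
    """Percentage of loci that amplified successfully in a taxon."""
    return 100.0 * amplified_loci / total_loci


# Hypothetical two-allele locus with allele counts 32 and 10.
freqs = allele_frequencies({"a": 32, "b": 10})
he = expected_heterozygosity(freqs)
pic_value = pic(freqs)
```

The illustrative counts were chosen so that the output (about 0.3628 and 0.297) can be compared with the two-allele marker EHB135 in Table 3; the actual allele counts are not given in the text.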

2.7. Linkage map construction 

Eighty-one H. brasiliensis progenies derived from
a cross between RRIM600 and RRII105 were used as the
mapping population. Genomic DNA of individual
samples was genotyped with informative
primers, and genotypic data were scored as codominant
markers under the cross-pollination model (e.g.
<abxcd>, <efxeg>, <lmxll>, <nnxnp> and
<hkxhk>) with up to four distinguishable alleles as
described by Van Ooijen and Voorrips. 34 The integrated
parental genetic linkage map was constructed
using the double pseudo-testcross strategy in
JoinMap 3.0. 34 The Mendelian segregation ratio of
all markers was evaluated using the chi-square test

Table 1. Isotig and singleton sequence length distribution

Sequence length (bp)    Number of singletons    Number of isotigs
101-500                 83 630                  2670
501-1000                1296                    9553
1001-1500               0                       7018
1501-2000               0                       4371
2001-2500               0                       2271
2501-3000               0                       1306
>3000                   0                       1198
Total                   84 926                  28 387

(χ2) and distorted markers (P<0.1) were excluded.
The map was constructed with an LOD score
threshold of 3.0, and the mapping parameters were
set with a recombination threshold of 0.4, a jump
threshold of 5.0 and a minimum LOD score threshold
of 1.0. The map distance between two markers in
centiMorgans (cM) was calculated using Kosambi's
mapping function. 35 The linkage map was drawn
using MapChart 2.2. 36
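
Two calculations mentioned in this section, the chi-square test for segregation distortion and Kosambi's mapping function, can be sketched in Python. This is an illustration and not JoinMap's implementation. The expected ratios are an assumption based on the usual expectations for these segregation types (1:1 for <lmxll> and <nnxnp>, 1:2:1 for <hkxhk>, 1:1:1:1 for <abxcd> and <efxeg>), and the genotype counts are hypothetical.

```python
import math


def chi_square(observed, ratio):
    """Chi-square statistic for observed genotype counts against a ratio."""
    total = sum(observed)
    ratio_sum = sum(ratio)
    expected = [total * part / ratio_sum for part in ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_p_value(statistic, df):
    """Upper-tail probability, using closed forms for 1 to 3 degrees of freedom."""
    half = statistic / 2.0
    if df == 1:
        return math.erfc(math.sqrt(half))
    if df == 2:
        return math.exp(-half)
    if df == 3:
        return math.erfc(math.sqrt(half)) + math.sqrt(2.0 * statistic / math.pi) * math.exp(-half)
    raise ValueError("only 1 to 3 degrees of freedom are supported in this sketch")


def kosambi_cm(r):
    """Kosambi map distance in cM for a recombination fraction 0 <= r < 0.5."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


# Hypothetical <lmxll> marker scored in 81 progeny (expected ratio 1:1).
stat = chi_square([45, 36], [1, 1])
p_value = chi_square_p_value(stat, df=1)
is_distorted = p_value < 0.1  # exclusion threshold quoted in the text
distance = kosambi_cm(0.10)
```

Degrees of freedom equal the number of genotype classes minus one, which is why only one to three are needed for the segregation types listed above.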

3. Results and discussion 

3.1. Transcriptome sequencing of rubber tree

A total of 2 311 497 filtered sequence reads were
generated from the vegetative shoot apical tissue
with an average read length of 294 bases totalling
676.5 Mb. The majority of the reads were in the
range 200-400 bp (Fig. 1). All reads were deposited
in the DDBJ Read Archive (ID = DRA000170). There
were 191 369 (8.27%) reads with homology to
plant tRNAs and rRNAs and 139 838 (6.04%) reads
shorter than 100 bp, which were removed before
sequence assembly. Raw sequencing reads were
assembled by Newbler 2.5, the de novo sequence
assembly software, 23 currently the most robust software
for 454 transcriptome assembly. 37 A total of
28 387 isotigs from 19 152 isogroups and 84 926
singletons were obtained from the assembly. An isogroup
theoretically represents a single gene;
however, genes with high sequence similarity may
be grouped together and therefore isogroups may
also represent gene isoforms or gene families. The
isotigs had an average length of 1326 bp and the
largest class of isotigs was between 501 and 1000 bp
(Table 1). The largest isotig (isotig08423) was
9041 bp, which showed sequence similarity to the
M. esculenta chloroplast polycistronic transcript psaA-psaB
(YP_001718437.1; E-value = 0). The most
highly represented isotig (isotig00002) was
assembled from 22 641 reads and the highest BLAST
match was the Hevea hydroxynitrile lyase Chain A
(AAC49184.1; E-value = 1E-59), reflecting the high
level of cyanogenesis activity in the tissue sample. It
is well recognized that all living tissues of H.
brasiliensis, including seeds, are strongly cyanogenic,
accumulating high quantities of cyanogenic precursors
such as linamarin and lotaustralin. 38 The
average GC content of the H. brasiliensis transcriptome
generated in this study was 42.16%, which is
similar to the GC content of H. brasiliensis sequences
in the GenBank EST database (42.18%). The GC
content of H. brasiliensis coding sequences is slightly
lower than the average GC content of Arabidopsis
coding sequences (44.5%) and rice coding sequences
(51.5%), 39 but much higher than that of Arabidopsis intergenic
regions (32.9%; http://gi.kuicr.kyoto-u.ac.jp). All
unique transcripts were annotated and characterized
according to GO using Blast2GO 29 and the results are
available at http://www4a.biotec.or.th/rubber.
In total, 61 625 isotigs and singletons were
assigned one or more GO terms. Under the biological 
process domain, 71 071 assignments were made, 
with a large proportion of assignments falling into 
the categories metabolic process (31.07%) and cellu- 
lar process (29.94%). A total of 60 927 assignments 
were made to the molecular function domain, with 
the majority falling into the categories binding 
activity (46.52%) and catalytic activity (39.82%). 
This distribution of GO terms is similar to the previous 
study in the pea SAM transcriptome sequencing. 40 
The large number of annotated sequences shows 
that ESTs generated by high-throughput sequencing 
are more likely to represent mRNA than ESTs gener- 
ated by lower-throughput methods, such as previous 
studies performed on the latex transcriptome. 20-22 
The reason for this is that high-throughput sequen- 
cing generates sequence data supported by multiple 
reads, often representing complete mRNAs.
Most of the EST sequences currently in GenBank
from Chow et al. 20 are un-annotated singletons,
with only 904 sequences (26%) having GO
terms assigned. Also, the diversity of the latex transcriptome
is limited, as Han et al. 22 pointed out that
the genes expressed in latex are mainly associated 
with rubber biosynthesis pathways, defence mechan- 
isms and allergenic proteins. A larger diversity of 
genes was considered more likely to be found in the 
vegetative shoot apex tissue than in latex, and that is 
what has been found here. 

A unique characteristic of the shoot apical tissue is 
the maintenance of the SAM via intercellular com- 
munication involving a complex signalling network 
such as epigenetic control, 41,42 transcriptional gene 
regulation 43-45 and hormonal regulation. 46 Key 
regulatory genes controlling SAM maintenance, such 
as WUSCHEL (WUS) and SHOOT MERISTEMLESS 
(STM) genes, were identified as putative full-length 
cDNAs in this data set. Recent studies revealed that
WUS plays the key role in SAM maintenance through
the regulatory loop of WUS-CLAVATA (CLV) feedback 41,47,48
and interacts with STM via phytohormone
signalling pathways. 49 Furthermore, the
regulation of WUS expression is also controlled by
auxin signalling, chromatin remodelling and positive
and negative transcriptional regulators. 50 Transcripts
for positive transcriptional regulators of WUS, such
as APETALA2, 45 SPLAYED and BARD1, which bind to
the WUS promoter sequence, 43 were found in this
data set, whereas transcripts of WUS negative transcriptional
regulators that are required for the development of floral
organs, such as ULTRAPETALA1 51 and HANABA
TARANU, 52 were not detected.

Also identified were genes from the KNOX (Knotted- 
like homeobox) family such as STM and KNOTTED-LIKE 
FROM ARABIDOPSIS THALIANA (KNAT) genes, which are
essential in maintaining the balance between organ
primordia growth and stem cell maintenance in the 
SAM. 53 KNOX transcription factors have roles in sup- 
pressing gibberellins (GAs) in the SAM by inhibiting 
GA-20 oxidase, which is required for GA biosynthesis, 
and promoting GA-2 oxidase, which inactivates the 
active GA. 54 Thus, KNOX proteins prevent the 
accumulation of GA in the central zone of the SAM, 
consequently preventing the differentiation of stem 
cells. KNOX proteins also promote cytokinin activity 
in the SAM central zone stimulating division and 
maintenance of undifferentiated stem cells. 46 
Flanking the SAM, high levels of auxin and GA activi- 
ties have roles in development and growth of lateral 
organ primordia. 46 The tissue sample in this study
contained both undifferentiated meristem and differentiated
organ primordia; therefore, transcripts of genes
involved in many phytohormone biosynthesis and
signalling pathways were likely to be identified.

3.2. Similarity of rubber tree ESTs to other plant 
proteins 

To investigate the efficiency of gene discovery in the 
H. brasiliensis transcriptome, the isotigs and single- 
tons were searched for homology using BLAST 28 
against other plant reference sequences such as M. 
esculenta (Euphorbiaceae, Rosids), Ricinus communis 
(Euphorbiaceae, Rosids), Arabidopsis thaliana 
(Brassicaceae, Rosids) and Oryza sativa (Poaceae, 
Liliopsida). The largest number of H. brasiliensis unigenes
matched proteins from Manihot (102 936 or
48.1%), followed by Ricinus (97 089 or 45.4%),
Arabidopsis (84 643 or 39.5%) and then Oryza (77 805
or 36.3%), as shown in Fig. 2. Hevea, Manihot and
Ricinus are phylogenetically related and grouped 
together in the Euphorbiaceae family; therefore, it 
was expected that a large number of H. brasiliensis 
isotigs and singletons would match proteins from 
Manihot and Ricinus. These observations agree with 
previous studies on cross transferability of EST 
markers which demonstrated a high level of genome 

Figure 2. Homology results of H. brasiliensis isotigs and singletons
that matched proteins in the plant reference databases
(Manihot esculenta, Ricinus communis, Arabidopsis thaliana and
Oryza sativa) using BLASTx.

Table 2. Distribution of identified SSRs using the MISA software according to SSR motif types and repeat numbers

          Number of repeat units
Repeats   6     7     8     9     10    11    12    13    14    15    >15   Total
MNR       N/A   N/A   N/A   N/A   2999  1915  1435  1127  813   688   4393  12 682
DNR       991   535   414   307   237   200   123   132   110   94    315   3458
TNR       700   292   192   114   103   45    42    31    14    15    45    1593
TTNR      33    10    5     2     1     0     0     0     0     0     0     51
≥PTNR     24    5     4     0     2     0     0     0     0     0     0     35

MNR, mononucleotide repeat; DNR, dinucleotide repeat; TNR, trinucleotide repeat; TTNR, tetranucleotide repeat; PTNR,
pentanucleotide repeat.



conservation among plants in Euphorbiaceae,
especially Hevea and Manihot. 3,55,56 However, it
should also be noted that sequence homology analysis
by BLAST can be biased by the number and quality
of sequences in the query and reference databases.

3.3. EST-SSR: distribution and frequencies 

A total of 17 819 SSRs were identified in the isotigs and
singletons from the H. brasiliensis transcriptome
(Table 2). This represents an average frequency of one
EST-SSR in every 3383 bp, which is a lower frequency
than the previous report (1 SSR per 2.25 kb) by Feng
et al. 3 Among plant species, SSR frequencies range from
1 per 1.5 kb in coffee 57 to 1 per 67 kb in mungbean. 25
The distribution and frequencies of EST-SSRs vary significantly
between studies owing to differences in SSR search criteria,
the size of the EST database and software tools, 58 so the
MISA software was used with the same criteria as the previous
study on H. brasiliensis EST-SSRs 3 to allow for direct
comparison. Of the 17 819 SSRs identified, there were
12 682 MNRs, 3458 DNRs, 1593 TNRs, 51 TTNRs and
35 SSRs with pentanucleotide repeats or more. It
should be noted that the number of MNRs may not be
accurate owing to the limitations of the 454 technology
in reading long homopolymer sequences. MNRs are
described here for the purpose of comparison with previous
studies, but they were not used for polymorphism
analysis. The most common type of DNR was AG/CT,
which accounted for 64.43% of the repeats, followed by
AT/TA (26.75%), AC/GT (8.48%) and GC/CG (0.3%).
The most common type of TNR was AAG/CTT
(32.53%), followed by AAT/ATT (19.17%) and ACC/
GGT (18.33%). The least frequent DNR and TNR motif
types were GC-rich motifs, which were found at only 33
loci. SSRs with GC-rich motif repeats are rare in many
plants, such as H. brasiliensis, 3 rice, corn, soybean, 59
wheat, 60 Arabidopsis, apricot, peach 61 and coffee. 57
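
Motif classes such as AG/CT or AAG/CTT pool a motif with its rotations and its reverse complement, because the same repeat reads differently depending on the strand and the starting position. A minimal Python sketch of this grouping is given below. It is an illustration with hypothetical function names, not the procedure used by MISA, and the motif list in the example is invented.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(motif):
    """Reverse complement of a DNA motif."""
    return motif.translate(_COMPLEMENT)[::-1]


def canonical_motif(motif):
    """Alphabetically smallest rotation of the motif or its reverse complement."""
    motif = motif.upper()
    candidates = []
    for strand in (motif, reverse_complement(motif)):
        candidates.extend(strand[i:] + strand[:i] for i in range(len(strand)))
    return min(candidates)


def motif_percentages(motifs):
    """Percentage of SSRs falling in each canonical motif class."""
    counts = Counter(canonical_motif(m) for m in motifs)
    total = sum(counts.values())
    return {motif: 100.0 * count / total for motif, count in counts.items()}


# Hypothetical dinucleotide motifs as they might be reported per SSR.
shares = motif_percentages(["AG", "CT", "GA", "AT"])
```

With this convention CT, TC and GA all fall into the AG class, so class percentages do not depend on which strand was sequenced.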

3.4. Polymorphism test in EST markers 

Four hundred and thirty EST-SSRs were selected from
the EST-SSRs present in the data set to give a range of repeat
units and motif types, and primers were designed to
amplify them. Based on the sequence homology
search, 16 of 430 (3.7%) SSR primer pairs were
mapped to the EST reads with GenBank accession
numbers reported for the development of EST-SSR
markers by Feng et al. 3 The primary PCR screening in
three clones of H. brasiliensis (RRIM600, RRIC110
and BPM24) showed that 323 primer pairs (75.11%)
were amplifiable (primer sequences are listed in
Supplementary File S2). We selected 47 primer pairs
that flanked long SSR motifs (criteria mentioned
above), gave clear PCR bands in the primary amplification
screening and represented all classes of nucleotide
repeats for polymorphism evaluation among 20
different H. brasiliensis clones (Table 3). The number
of alleles observed at each locus ranged from two to
six, with an average of 3.85. Although H. brasiliensis is
believed to be a stabilized amphidiploid, only five SSRs
(EHB61, EHB100, EHB109, EHB115 and EHB144)
gave more than two alleles, supporting the report by
Lespinasse et al. 2 that H. brasiliensis behaves as a
diploid. The value of expected heterozygosity varied
from 0.1349 to 0.7494 with an average of 0.5594,
while direct count heterozygosity ranged from
0.1429 to 0.8095 with an average of 0.5076. PIC
values ranged from 0.13 to 0.71 with an average of
0.50, which is higher than the previous report (PIC =
0.38) by Feng et al. 3 The higher PIC value in this study
was probably due to the larger sample size of H. brasiliensis
clones tested (n = 20) than in the previous study (n =
12). The average number of alleles and PIC values of
EST-SSR markers are lower than those of genomic SSR
markers, 8 as expected for functional sequences.

The 47 polymorphic EST-SSR markers were used to 
evaluate the genetic relatedness among 20 different 
H. brasiliensis clones which were classified into two 
groups at the level of genetic similarity 0.44 (Fig. 3). 
This generally corresponded well to the clone 
pedigree (Supplementary File S1). The first group 
(Group I) contains clones: PB260, PB310, RRIM605, 
PB217, PB5/51 and PB235. The majority of these 
clones have PB5/51 or PB49 clones as one of the par- 
ental lines. Group II contains a mixture of primary 
clones and cultivated clones from various rubber 



Table 3. Characteristics of the 47 primer pairs targeting polymorphic microsatellite loci analysed in 20 different clones of H. brasiliensis. 



Marker 


Forward sequence 


Reverse sequence 


Repeat motif 
and count 


PCR product 
size (bp) 


Number of 
alleles 


Expected 
heterozygosity 


Observed 
heterozygosity 


PIC 


EHB01 2 


AAG ATTG AACTAG G GTTG AACTGG 


CCAAATGTTCATTTAATTGTGGA 


(CAA) 8, (TAA) 
6 


250 


-300 


5 


0.6746 


0.3333 


0.61 86 


EHB01 3 


AAG CAAG G AAG AGG AAGG G A 


CAAGAAGTTGCCCAI I 1 1CA 


(1 1 1 IA) 8 


225 


— 255 


3 


0.4206 


0.5238 


0.3824 


EHB025 


ACCGTCCACCATAACCACAT 


AAAGGCCATGCCTACATTTG 


(CT) 1 0, (CA) 

1 7 
I Z 


245 


— 250 


3 


0.4989 


0.2857 


0.447 


LtlDUj 3 


ATA CCC A C A CC TATr TC CC C 
A l/-\^L^r\Lir\L.L. 1 r\l Li 1 LiLil_Li 


a ATrrrrTrrr Ar ATTrTTT 

AAI LjLjLjL, 1 L^LiLjALiAI 1 L^ 1 1 1 


(TC\ 1 A 


225 


-240 




U.Z I Db 


n 1 A 0 Q 
U. I 4 Z " 


n o n c: 1 

U.ZUD 1 


LHdUo4 


ATArrrr 1 ArrrrA a at — rrT — r 
Al ALjCL^UAL.L.CL,AAAI IL.I 1 


rrAfAffA AT ArAT/" A ATA r ~Tr 

UUAL,AOL,AALiAL,AI UAALjALj 1 Lj 


( An — rr"\ c 
lALi I I U ) b 


148 


-1 55 


r 

3 


U./ I b4 


u,/b I 7 


A C 7 Q 3 

u.b / oo 


bnDUb 1 


(T'KCS.rT'b. Ar KCC Arr A~I — TA 

L.L.ACAUL.AAL,AL^L-AL,L,A1 IA 


Tr ATrr A~rrr a Axr a Arr a a 
1 UAI L.L,A1 L,L,AAI LiAALiL,AA 


/"rArrAA^ c 
l^LALiLAAJ b 


1 50 


-200 


A 

4 


A C C Q 

U.D bo 


A C 3 Q Q 
U.DZ 3 O 


A /| Q7C 

u.4o / b 


c ui en c 7 
brlDUb o 


cc &cctcc~\ — rrTrTTATA Arr 

LLALiL 1 ULj 1 IUIUI IAUAALiLi 


r ArrTr ATrTTrr Ar r r ArTT 

LiALjL, 1 L,AIL- 1 1 L,L,ALiLiLjAL. 1 1 


ri — r"\ 1 7 
^L, I I ) I z 


160 


-270 


A 


U.b 3 DD 


A /IT QC 
U.4Z ob 


a enc 
U.bUD 


CURAt: r 
trlDUD D 


CC ACTC Arr ArACCC ATA AT 
V_^_ALi 1 LiALiL^ALALiLj^AI AAI 


Trr Ar ArTrrao-ATr a ATrr 

1 LjLiALjALi 1 Li^_dgAI LiAAl LjIw 


( a at"! 1 n 

l^AAl J I U 


300 


-350 


3 
3 


A C/1 AO 
U. 3 4 U O 


A /I 7 Q c 
U.4 Z o b 


A ACLAQ 
U.4 b 4 o 


EHB069 


CCCATTTCTACAACACACACTTTC 


TGCTAGGGCCTTGTCGATAC 


(AAAAAT) 5 


1 00 


— 110 


4 


0.5884 


0.2381 


0.5459 


EHB070 


CCCCACATGCGATTTAACTT 


TGCCCTGTGTTGTGCTATTC 


(AAG) 1 0 


230 


— 250 


5 


0.7494 


0.4762 


0.7051 


EHB079 


CCTATCCTTCTGCTCGTTCG 


TTTCCACAGAAGGGAAGGTG 


(ATC) 1 1 


1 50 


-1 65 


6 


0.6837 


0.6667 


0.64 


EHB081 


CCTCTTGCTCTGAAAGCCAC 


AACCAACCAACTG G G ATCAA 


(CACCGG) 5 


235 


-245 


4 


0.621 3 


0.7619 


0.5455 


EHB085 


CGATTAGGTACCTGATCCCA 


AAGTTGTTGAGGAATGATCAGGA 


(TCATGC) 5 


110 


-1 20 


6 


0.7472 


0.5238 


0.7061 


EHB086 


CGCATCCCAACAAGCTAAAT 


CAGAAAGCAATCACAACACACA 


(TC) 1 0, 
(Gl I I) 7 


245 


-250 


3 


0.1 349 


0.1429 


0.1 3 


EHB087 


CGGAGCTAAGTTCGAGTCCTT 


CTGGAACCGTATTTCCAGGT 


(ATT) 1 3 


1 80 


— 200 


4 


0.6202 


0.5714 


0.5693 


EHB088 


CGGAGGCTCCAATTAGACAA 


AAGATGGTCTGTGATCGTGCT 


(TGAGT) 7 


1 60 


-250 


4 


0.2959 


0.2381 


0.2825 


EHB1 00 


CTG CCG ATGTG CTCTTCATA 


AAATGAGGTTGGTCGTCGTC 


(GCTTCT) 6, 

1 1 ) o 


240 


— 270 


3 


0.561 2 


0.71 43 


0.465 


C l_l D 1 AQ 


UAAaLiL, 1 AAL.LjLj 1 LjLiAL, 1 LU 


Arr a Axrrr An — i — rr r ~rr ~\ — r 
AL,LiAAI L-ULiAL, 1 1 1 ULi 1 U 1 I 


(A I L, J I U 


252 


-254 


b 


A C Q Q A 

U.by o4 


A C 1 Q 

u.b i y 


A C C 1 3 

u.bb I o 


E UI D 1 1 A 

tnD I I U 


UAAIL,L, 1 uLLAU 1 L1L1L1AL, IA 


r Ar a A.nn\~nncc aapaataa 

LiALjAAULj 1 LjL,L-LiAALiAALiAA 


^ 1 1_ I J I U 


200 


-230 


7 

z 


A A R Q 7 
U.4 3 7 Z 


A CI Q 

U.b 1 y 


A 7 C 7 Q 


C l_l Q 1 1 3 

LHd I I z 


C Ar A~I — rA rr ATCCC ACTCCC 

UAL.A1 1 ACLAI L^L.CAL, 1 \J<J^ 


tc An — r a ccACf'Accc at — rr 
1 LALi 1 IAL,L>\LjL,ALjL,L,AI lu 


( ATA ^1 /I 

IAIAJ I U 


1 80 


-1 88 


7 
Z 


A /I /I /I /I 

U.4444 


A A 1 CO 

U.4 / bz 


A 3 /I C 7 


C l_l □ 1 1 7 
tnD I I o 


rirrrAn — rr ArrTrr aa Ar 

LiAUULAL 1 1 LjAUL, 1 LLAAAL 


rr a ATr r r r a at — i — i — rrn — rr a 

L,LiAAI L,L,LiLiAAI 1 1 1 L, 1 1 LA 


l^UL, I ) I U 


1 70 


-1 75 


4 


n coco 
U.b i5 3" 


A £CC7 

U.bbb / 


A CTAQ 


C l_l □ 1 1 C 
tnD I I D 


c ATr a a rnr a a a Arrr 

UAI LAALL 1 LjAAAALiL^AL^L- 


r Artrm Ar a ATr r Arr Arr 

LiALjtCgaALjAAl L,L-AL-LiALiL. 


f fi — i — r"* q 
I I I J o 


225 


-250 


3 


A C 7 1 C 

u.bo I D 


A 7 Q 1 


A CCQ/I 

U.D 3 o4 


C l_l □ 1 1 Q 
tnD I I o 


rr a a ata ATr r rr Ar rTrTT 

ULAAAI AAI LjLjL^LiALjL-. 1 Li 1 1 


irn — rr ATr rr Ar a Ar a a Ar 

1 UU 1 1 LjAI ULiL,ALjAAL,AAALj 




160 


-1 78 


/I 
4 


U.DouZ 


A 7 Q 1 


A C/1 7 


CUR 1 7 a 
tnD I Z U 


r r a ArrrTTrrTrTTr Ar at 

Li^AA<^L^Li 1 1 CL, 1*^1 1 LALAI 


TrrTrrrTr Arr a a Ar ArTr 

1 1 LLLi 1 LALLAAAUAL 1 L. 


l^AOA^ I U 


1 34 


-1 40 


7 


A /I A Q 7 
U.4 U o Z 


A A 7C.7 
U.4 / b Z 


A 5 9/1 Q 
U.O Z 4 7 


EHB122  GCATGaTTGGGAAACCAGAT         GAGTCAACCTGGAAATTAGCG       (TCT)13                 230-250                4                  0.6134                   0.5238                   0.5486
EHB125  GCTTCCAGTCCACAAAGCAG         TCATCAGACAGAAATAGTAATAGCCG  (TC)9, (AC)9            152-154                4                  0.5181                   0.381                    0.442
EHB126  GCTTCCTCTTTCCGTGTTTG         ACCAATTGAAAGGCACTGCT        (ATTTC)6                115-140                4                  0.5941                   0.619                    0.5396
EHB127  GGAAATTCTGCTGGCACTGT         TCGTGACCCAACAGAATAAAGA      (ATTA)8                 190-220                4                  0.6723                   0.6667                   0.61
EHB133  GGCCATCACTCAACATCCTT         CTCACCCTTTTGAAAGCGAA        (CTT)10                 210-225                3                  0.5159                   0.5238                   0.4233
EHB135  GGGGACGCTTCATGGTAGTA         ACTTGTCAATTGGTGGCACA        (TTA)12                 114-125                2                  0.3628                   0.4762                   0.297



Table 3. Continued

Marker  Forward sequence             Reverse sequence            Repeat motif and count  PCR product size (bp)  Number of alleles  Expected heterozygosity  Observed heterozygosity  PIC
EHB136  GGGTATGGATGTGGTGAAGG         ATGGTTTGGTTCTCATCCCA        (GTG)10                 253-255                4                  0.6224                   0.619                    0.5605
EHB140  GGTAGAGGTTTGGAGGGGAG         TGATGGCAGCTATGCTGAAC        (TGAGAC)5               140-153                4                  0.602                    0.5238                   0.5528
EHB143  GGTGGTAAAAGTGGCAATGG         CTCCATTTTGTCACCACCACT       (TGG)8                  175-220                3                  0.5442                   0.8095                   0.4393
EHB144  GGTTCTTTGCCGGATCTACA         CTgGGGCATCAGAGATTTGT        (CAG)6, (AAG)6          160-182                3                  0.2098                   0.2381                   0.1878
EHB148  GGTTTTCAAAATCTTTTCTATACATCC  TGCAGAAGCATCAACAAACC        (ATT)16                 160-180                6                  0.7392                   0.5238                   0.7014
EHB151  GTCCGGTGAAATGAGATGCT         AGGCGGAAACAGACTCTGAA        (ATT)15                 225-245                3                  0.3571                   0.381                    0.3254
EHB157  GTTGGCCTGGTCAATCTCAT         GATTAATTCAGTGGTGGCGG        (CCCAAT)5               200-215                2                  0.2778                   0.3333                   0.2392
EHB159  TACCAAGCATGTTGCCCATA         TCTCAGAAACAAGGGTTGGG        (CA)17                  185-210                5                  0.7063                   0.6667                   0.6497
EHB160  TAGAAGCTGCCCACAATGC          TTGACGCCAAATGTTTATGC        (AAT)13                 210-235                6                  0.6338                   0.7143                   0.5817
EHB161  TAGGATGAGGTTTTGGCTGC         TGGCTCCTTGAAACTGCTCT        (CATCGT)5               250-270                3                  0.4751                   0.3333                   0.3826
EHB168  TCAAGCGCATCACACGTATC         TGGTCACCGAACAACAACAT        (TCA)10                 118-120                3                  0.4524                   0.3333                   0.3845
EHB169  TCACTTTTCACAACCCACCA         GGCAAACCAGGAAATCAACA        (TCT)11                 200-225                4                  0.7336                   0.619                    0.6845
EHB177  TCGCTTTCTCCATATAGAGTTTCA     CAGCAAGAAATCCCTCAACC        (GAA)7, (TTC)8          209-212                6                  0.7188                   0.8095                   0.6674
EHB178  TCGTGACCCAACAGAATAAAGA       GGAAATTCTGCTGGCACTGT        (ATTA)8                 190-215                4                  0.7302                   0.619                    0.68
EHB190  TGATCCCAAGAACTAGCTTGC        TAGGAATGGTACCGACCCAC        (TCATGC)7               130-140                4                  0.6655                   0.5714                   0.6043
EHB197  TGGAAGTGAGAATCAcGGTTT        CGAAGACTTGTGTCAGCAGC        (GAT)7, (GAG)7          250-258                4                  0.6134                   0.381                    0.5486
EHB198  TGGCATTCCCACTAATTCAA         CGGTGGAAATGCTAAGCTGT        (AAACCAG)5              194-200                5                  0.7245                   0.8095                   0.6754
Mean                                                                                                            3.8511             0.5594                   0.5076                   0.5026
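
The expected heterozygosity and PIC columns of Table 3 can be
recomputed from allele counts. The sketch below assumes the
usual definitions (expected heterozygosity as one minus the sum
of squared allele frequencies, and the standard PIC formula);
this section does not state which estimators were used, although
the two-allele rows of the table are numerically consistent with
them. The function names and the allele counts (10 and 32 copies
among 42 scored alleles) are illustrative assumptions, chosen
because they give values that agree with the EHB135 row.

```python
from itertools import combinations


def allele_frequencies(counts):
    """Convert raw allele counts at one locus into frequencies."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("at least one allele copy is required")
    return [c / total for c in counts]


def expected_heterozygosity(counts):
    """He = 1 - sum(p_i^2), assuming no small-sample correction."""
    freqs = allele_frequencies(counts)
    return 1.0 - sum(p * p for p in freqs)


def pic(counts):
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    freqs = allele_frequencies(counts)
    homozygous = sum(p * p for p in freqs)
    cross_terms = sum(2 * (p * q) ** 2 for p, q in combinations(freqs, 2))
    return 1.0 - homozygous - cross_terms


# Hypothetical counts for a two-allele locus (not taken from the paper).
counts = [10, 32]
print(round(expected_heterozygosity(counts), 4))  # 0.3628
print(round(pic(counts), 4))                      # 0.297
```

For a locus with only two alleles, PIC reduces to He - He^2/2,
which is why PIC is always below the expected heterozygosity in
Table 3.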
Figure 3. Similarity relationships of 20 different H. brasiliensis clones based on 47 EST-SSR loci. Dendrogram scale: genetic similarity, with axis ticks at 0.53, 0.63, 0.74 and 0.84.

Figure 4. The genetic linkage map of rubber tree F1 population (H. brasiliensis) developed from EST-SSR and SSR markers. The map is
composed of 97 loci covering 842.9 cM on 23 linkage groups.
research institutes. In one branch of Group II, RRII203,
RRIT21 and RRIC100 clustered together and share PB86
as one of their parents. Clone RRIT21, a descendant of
PB86 x RRIT13, grouped closely with RRIT13, with a
genetic similarity of 0.66. Another branch of Group II
contained descendants of the Tjir 1 clone (RRIM600,
RRII105 and RRIM703). Although RRIM605 has Tjir 1 as
its female parent, it was classified in Group I because it
shares the same male parent (PB49) with PB260.
Since these markers reproduce the relationships already
known from pedigree information, they can provide
reliable genotypic information for clonal identification
and for the selection of parents in breeding programs. A
minimal set of five highly informative microsatellites
(EHB85, EHB109, EHB169, EHB177 and EHB178)
was able to distinguish each of the H. brasiliensis
clones included in this study.
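
One way to look for a small discriminating marker set like the
one described above is an exhaustive search over marker
combinations of increasing size. The sketch below is only an
illustration under stated assumptions: the text does not say
how the five-marker set was chosen, and the clone names, marker
names and genotypes (allele sizes in bp) are invented for the
example.

```python
from itertools import combinations


def distinguishes_all(genotypes, markers):
    """True if every clone has a unique multilocus profile."""
    profiles = [tuple(clone[m] for m in markers) for clone in genotypes.values()]
    return len(set(profiles)) == len(profiles)


def minimal_marker_set(genotypes, markers):
    """Return the first smallest marker subset that separates all clones."""
    for size in range(1, len(markers) + 1):
        for subset in combinations(markers, size):
            if distinguishes_all(genotypes, subset):
                return subset
    return None  # no combination separates every clone


# Hypothetical genotypes: clone -> marker -> (allele 1, allele 2).
genotypes = {
    "clone_A": {"M1": (150, 152), "M2": (200, 200), "M3": (118, 120)},
    "clone_B": {"M1": (150, 152), "M2": (200, 210), "M3": (118, 120)},
    "clone_C": {"M1": (152, 152), "M2": (200, 200), "M3": (118, 120)},
    "clone_D": {"M1": (152, 152), "M2": (200, 210), "M3": (118, 118)},
}

print(minimal_marker_set(genotypes, ["M1", "M2", "M3"]))  # ('M1', 'M2')
```

Exhaustive search is practical for a few dozen markers and small
set sizes; for larger panels a greedy heuristic would be a more
realistic choice.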

Genotyping of the 93 EST-SSR markers that were
amplifiable in H. brasiliensis was performed across
genera with two accessions of M. esculenta, three
accessions of J. curcas and one accession of J. gossypifolia,
all belonging to the same family as H. brasiliensis,
Euphorbiaceae. The results showed that 47 of 93
primer pairs (40%) gave successful amplifications in
Manihot species. Fourteen primer pairs (15%) and
nine primer pairs (10%) were successfully amplified
in J. curcas and J. gossypifolia, respectively. The higher
rate of cross-transferability in Manihot species
compared with that in Jatropha species suggests a closer
relationship of Hevea with Manihot than with
Jatropha. Six primer pairs, EHB61, EHB63, EHB85,
EHB115, EHB116 and EHB156, were able to amplify
unique products from all plant taxa tested.

3.5. Genetic linkage map 

From 323 novel EST-SSR primer pairs amplifiable in
H. brasiliensis, 59 primer pairs were polymorphic
between the RRIM600 and RRII105 parental clones. All
of these polymorphic primers were used, together with
98 published SSR markers, to genotype 81 individual F1
samples. Genotypic data were scored and
subjected to linkage analysis. Of 157 polymorphic
markers, 124 markers (78.9%) revealed the expected
Mendelian segregation ratio and were used to construct
the genetic linkage map. The F1 map consisted of 97 loci
distributed on 23 linkage groups. Of these, 37 loci
were novel EST-SSRs. The total map distance covered
842.9 cM with a mean interval of 11.9 cM and an
average of approximately four loci per linkage group
(Fig. 4). The number of linkage groups
exceeds the expected haploid number of linkage
groups (18 linkage groups), suggesting that more
markers are required to fill the gaps between adjacent



Table 4. Comparison of common SSR marker positions between linkage maps

Common SSR loci  Position in Le Guen et al.16  Position in this study
mHbCIRTAs2557    LG1 (96.7 cM)                 LG23 (9.9 cM)
mHbCIRTAs2510    LG2 (48.2 cM)                 LG6 (42.6 cM)
mHbCIRTAs2186    LG5 (19 cM)                   LG5 (22 cM)
mHbCIRTAs2603    LG5 (97.6 cM)                 LG5 (84.1 cM)
mHbCIRT67        LG8 (106.6 cM)                LG4 (97.8 cM)
mHbCIRTAs2260    LG11 (15.6 cM)                LG10 (35.9 cM)
mHbCIRa268       LG11 (24.7 cM)                LG10 (19.5 cM)
mHbCIRA2736      LG11 (34.1 cM)                LG10 (8.1 cM)
mHbCIRA2536      LG11 (38.3 cM)                LG10 (0 cM)
mHbCIRa282       LG14 (19 cM)                  LG1 (80.3 cM)
mHbCIRA2435      LG14 (53.9 cM)                LG1 (7.1 cM)
mHbCIRA2423      LG14 (58 cM)                  LG1 (0 cM)
mHbCIRA2298      LG9 (9.5 cM)                  LG8 (20.9 cM)
mHbCIRa104       LG9 (54 cM)                   LG8 (0 cM)
mHbCIRA2432      LG9 (119.9 cM)                LG7 (36.9 cM)
mHbCIRTAs2225    LG16 (0 cM)                   LG11 (42.5 cM)
mHbCIRa131       LG16 (6.3 cM)                 LG11 (31 cM)
mHbCIRA2410      LG16 (101.1 cM)               LG15 (7.1 cM)
mHbCIRA463       LG18 (44 cM)                  LG13 (0 cM)
mHbCIRA320       LG18 (46.4 cM)                LG13 (3.9 cM)
mHbCIRA2409      LG18 (54.5 cM)                LG13 (19.9 cM)
mHbCIRAs2217     LG18 (64. cM)                 LG13 (29 cM)
mHbCIRT373       LG18 (94 cM)                  LG12 (0 cM)
mHbCIRA2439      LG18 (100.6 cM)               LG12 (3.8 cM)
mHbCIRTAs2744    LG18 (109.5 cM)               LG12 (6.5 cM)



markers. Moreover, the comparison between the F1
map in this study and the map constructed by Le
Guen et al.16 using different parents revealed 25
common markers on nine homologous linkage
groups between both maps (Table 4). A total of
seven marker intervals of 20 markers showed
co-linearity between homologous linkage groups. Some
linkage groups in this study (LG7-LG8, LG11-LG15
and LG12-LG13) could be joined together based on
the linked markers in Le Guen et al.16 The common
and collinear markers indicated the reliability
between different maps.62 Therefore, the F1 map of
this study is appropriate for further studies in
marker-assisted selection.
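
The segregation filter in Section 3.5 (124 of 157 markers kept
because they showed the expected Mendelian ratio) is commonly
implemented as a chi-square goodness-of-fit test. The sketch
below assumes that approach and a 5% significance level; neither
is stated in this section, and the genotype counts are invented
for illustration. The critical values are the standard tabulated
ones for 1 to 3 degrees of freedom.

```python
# Chi-square critical values at alpha = 0.05, keyed by degrees of freedom.
CHI2_CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815}


def chi_square_statistic(observed, ratio):
    """Goodness-of-fit statistic for observed class counts vs. an expected ratio."""
    if len(observed) != len(ratio):
        raise ValueError("observed and ratio must have the same length")
    total = sum(observed)
    ratio_sum = sum(ratio)
    expected = [total * r / ratio_sum for r in ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def fits_mendelian_ratio(observed, ratio):
    """True if the marker does not deviate significantly from the ratio."""
    degrees_of_freedom = len(observed) - 1
    critical = CHI2_CRITICAL_05[degrees_of_freedom]
    return chi_square_statistic(observed, ratio) <= critical


# Hypothetical counts among 81 F1 individuals (not from the paper).
print(chi_square_statistic([45, 36], (1, 1)))          # 1.0
print(fits_mendelian_ratio([45, 36], (1, 1)))          # True
print(fits_mendelian_ratio([60, 21], (1, 1)))          # False
print(fits_mendelian_ratio([18, 44, 19], (1, 2, 1)))   # True
```

Markers that fail the test show segregation distortion and are
typically set aside before map construction, as was done for the
33 markers excluded here.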

Supplementary data: Supplementary data are 
available at www.dnaresearch.oxfordjournals.org. 